Equitability, interval estimation, and statistical power

Authors

  • Yakir Reshef
  • David N. Reshef
  • Pardis Sabeti
  • Michael Mitzenmacher
Abstract

As data sets grow in dimensionality, non-parametric measures of dependence have seen increasing use in data exploration due to their ability to identify non-trivial relationships of all kinds. One common use of these tools is to test a null hypothesis of statistical independence on all variable pairs in a data set. However, because this approach attempts to identify any non-trivial relationship no matter how weak, it is prone to identifying so many relationships — even after correction for multiple hypothesis testing — that meaningful follow-up of each one is impossible. What is needed is a way of identifying a smaller set of “strongest” relationships of all kinds that merit detailed further analysis. Here we formally present and characterize equitability, a property of measures of dependence that aims to overcome this challenge. Notionally, an equitable statistic is a statistic that, given some measure of noise, assigns similar scores to equally noisy relationships of different types (e.g., linear, exponential, etc.) [1]. We begin by formalizing this idea via a new object called the interpretable interval, which functions as an interval estimate of the amount of noise in a relationship of unknown type. We define an equitable statistic as one with small interpretable intervals. We then draw on the equivalence of interval estimation and hypothesis testing to show that under moderate assumptions an equitable statistic is one that yields well powered tests for distinguishing not only between trivial and non-trivial relationships of all kinds but also between non-trivial relationships of different strengths, regardless of relationship type. This means that equitability allows us to specify a threshold relationship strength x0 below which we are uninterested, and to search a data set for relationships of all kinds with strength greater than x0. 
Thus, equitability can be thought of as a strengthening of power against independence that enables fruitful analysis of data sets with a small number of strong, interesting relationships and a large number of weaker, less interesting ones. We conclude with a demonstration of how our two equivalent characterizations of equitability can be used to evaluate the equitability of a statistic in practice.
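To make the notion concrete, the following is a minimal sketch, not the authors' procedure. It assumes that noise is measured by R^2 (the squared correlation between the noiseless function value and the observed response) and uses squared Pearson correlation as the statistic under test. The helper names `pearson_sq` and `noisy_sample`, the two relationship types, the sample size, and the target R^2 are all hypothetical choices made for illustration. The sketch generates a linear and a parabolic relationship with the same R^2 and scores both.

```python
import math
import random
import statistics


def pearson_sq(xs, ys):
    """Squared Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def noisy_sample(f, r2, n, rng):
    """Draw n points y = f(x) + Gaussian noise, with noise scaled so that
    Var(f(x)) / (Var(f(x)) + Var(noise)) is approximately r2.
    Assumption: R^2 is the chosen measure of noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    fx = [f(x) for x in xs]
    var_f = statistics.pvariance(fx)
    # R^2 = var_f / (var_f + var_noise)  =>  var_noise = var_f * (1 - r2) / r2
    sigma = math.sqrt(var_f * (1.0 - r2) / r2)
    ys = [v + rng.gauss(0.0, sigma) for v in fx]
    return xs, ys


rng = random.Random(0)
target_r2 = 0.8  # both relationships are equally noisy under this measure

relationships = {
    "linear": lambda x: x,
    "parabolic": lambda x: x * x,
}

scores = {}
for name, f in relationships.items():
    xs, ys = noisy_sample(f, target_r2, 2000, rng)
    scores[name] = pearson_sq(xs, ys)

for name, score in scores.items():
    print(f"{name}: squared Pearson = {score:.3f} at R^2 = {target_r2}")
```

Under these assumptions the linear relationship scores close to the target R^2, while the equally noisy parabolic one scores near zero. A given score is therefore consistent with a wide range of noise levels, which is the kind of behavior a large interpretable interval would capture, so squared Pearson correlation would not be considered equitable on this pair of relationship types.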

Related resources

Theoretical Foundations of Equitability and the Maximal Information Coefficient

The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables [1]. MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets. Here we formalize the theory behind both equitability and MIC in the ...

Equitability, mutual information, and the maximal information coefficient.

How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequa...

An Empirical Study of Leading Measures of Dependence

In exploratory data analysis, we are often interested in identifying promising pairwise associations for further analysis while filtering out weaker, less interesting ones. This can be accomplished by computing a measure of dependence on all possible variable pairs and examining the highest-scoring pairs, provided the measure of dependence used assigns similar scores to equally noisy relationsh...

An Empirical Study of the Maximal and Total Information Coefficients and Leading Measures of Dependence

In exploratory data analysis, we are often interested in identifying promising pairwise associations for further analysis while filtering out weaker ones. This can be accomplished by computing a measure of dependence on all variable pairs and examining the highest-scoring pairs, provided the measure of dependence used assigns similar scores to equally noisy relationships of different types. Thi...

Cleaning up the record on the maximal information coefficient and equitability.

Although we appreciate Kinney and Atwal’s interest in equitability and maximal information coefficient (MIC), we believe they misrepresent our work. We highlight a few of our main objections below. Regarding our original paper (1), Kinney and Atwal (2) state “MIC is said to satisfy not just the heuristic notion of equitability, but also the mathematical criterion of R equitability,” the latter ...

Journal title:
  • CoRR

Volume abs/1505.02212  Issue

Pages  -

Publication date 2015